Headline extraction based on a combination of uni- and multidocument summarization techniques

Authors

  • Wessel Kraaij
  • Martijn Spitters
  • Anette Hulth
Abstract

The TNO system for multi-document summarisation is based on an extraction approach. For headline generation, we chose to extend our system to extract the most informative topical noun phrase. The cluster topic is defined as the most frequent term occurring in the most salient document sentences. The core of our system is a probabilistic model that estimates the log-odds of salience from a number of features, including sentence position, sentence length, cue phrases, and a language-model-based content score. The parameters of the model were estimated on annotated training data.
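The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the TNO implementation: the linear form of the log-odds, the weights, the cue-phrase list, the stopword list, the length thresholds, and the add-one-smoothed unigram content score are all assumptions made for illustration, whereas the paper estimates its parameters from annotated training data. The sketch scores every sentence, keeps the most salient ones, and returns the most frequent term among them as the cluster topic. It does not cover the noun-phrase extraction step.

```python
import math
import re
from collections import Counter

# All constants below are hypothetical, chosen only for illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by", "at"}
CUE_PHRASES = ("in conclusion", "in summary", "officials said")
WEIGHTS = {"bias": -1.0, "position": 1.5, "length": 0.5, "cue": 1.0, "content": 0.3}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def unigram_model(documents):
    """Add-one-smoothed unigram model over the whole cluster (an assumed content model)."""
    counts = Counter(t for doc in documents for s in doc for t in tokenize(s))
    total, vocab = sum(counts.values()), len(counts)
    return lambda term: (counts[term] + 1) / (total + vocab)


def salience_log_odds(sentence, index, n_sentences, prob):
    """Linear combination of the four feature types named in the abstract."""
    tokens = tokenize(sentence)
    position = 1.0 - index / max(n_sentences - 1, 1)   # 1.0 for the lead sentence
    length = 1.0 if 5 <= len(tokens) <= 30 else 0.0    # assumed "reasonable length" band
    cue = 1.0 if any(c in sentence.lower() for c in CUE_PHRASES) else 0.0
    content = sum(math.log(prob(t)) for t in tokens) / len(tokens) if tokens else -20.0
    return (WEIGHTS["bias"] + WEIGHTS["position"] * position + WEIGHTS["length"] * length
            + WEIGHTS["cue"] * cue + WEIGHTS["content"] * content)


def rank_sentences(documents):
    """Return (log_odds, sentence) pairs for the cluster, most salient first."""
    prob = unigram_model(documents)
    scored = [(salience_log_odds(s, i, len(doc), prob), s)
              for doc in documents for i, s in enumerate(doc)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def cluster_topic(documents, top_k=3):
    """Most frequent term among the top_k most salient sentences."""
    top = rank_sentences(documents)[:top_k]
    counts = Counter(t for _, s in top for t in tokenize(s))
    return counts.most_common(1)[0][0]


# Toy cluster: two documents, each a list of sentences.
DOCS = [
    ["Flood waters rose in the city on Monday.",
     "Officials said the flood damaged many homes.",
     "Weather was mild elsewhere."],
    ["The flood forced thousands to leave.",
     "Rescue teams reached the flood zone.",
     "A local festival was postponed."],
]

if __name__ == "__main__":
    print(cluster_topic(DOCS))
```

A log-odds score can be mapped to a probability with the logistic function if a calibrated salience estimate is needed; for ranking, the raw score is sufficient.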

Similar resources

Multidocument Summarization via Information Extraction

Although recent years have seen increased and successful research efforts in the areas of single-document summarization, multi-document summarization, and information extraction, very few investigations have explored the potential of merging summarization and information extraction techniques. This paper presents and evaluates the initial version of RIPTIDES, a system that combines information ...

Multidocument Summarization with GISTexter

This paper presents the architecture and the multidocument summarization techniques implemented in the GISTEXTER system. The paper presents an algorithm for producing incremental multi-document summaries if extraction templates of good quality are available. An empirical method of generating ad-hoc templates that can be populated with information extracted from texts by automatically acquired e...

Multi-Document Summarization By Sentence Extraction

This paper discusses a text extraction approach to multidocument summarization that builds on single-document summarization methods by using additional, available information about the document set as a whole and the relationships between the documents. Multi-document summarization differs from single in that the issues of compression, speed, redundancy and passage selection are critical in ...

Detecting Discrepancies in Numeric Estimates Using Multidocument Hypertext Summaries

To aid analysts in detecting discrepancies in numeric estimates in news articles from multiple sources, we propose the automatic generation of hypertext summaries that include a high-level textual overview; tables of all comparable numeric estimates, organized to highlight discrepancies; and targeted access to supporting information from the original articles. The RIPTIDES system, which exempli...

Abstractive Multi-document Summarization by Partial Tree Extraction, Recombination and Linearization

Existing work on abstractive multidocument summarization utilises existing phrase structures directly extracted from input documents to generate summary sentences. These methods can suffer from a lack of consistency and coherence in merging phrases. We introduce a novel approach for abstractive multidocument summarization through partial dependency tree extraction, recombination and linearization...

Publication date: 2002